Cyberinfrastructure exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Implementing a genomic data management system using iRODS in the Wellcome Trust Sanger Institute

Internal identifier: 000690 (Main/Exploration); previous: 000689; next: 000691

Authors: Gen-Tao Chiang [United Kingdom]; Peter Clapham [United Kingdom]; Guoying Qi [United Kingdom]; Kevin Sale [United Kingdom]; Guy Coates [United Kingdom]

RBID : PMC:3228552

Abstract

Background

Increasingly large amounts of DNA sequencing data are being generated within the Wellcome Trust Sanger Institute (WTSI). The traditional file system struggles to handle these increasing amounts of sequence data. A good data management system therefore needs to be implemented and integrated into the current WTSI infrastructure. Such a system enables good management of the IT infrastructure of the sequencing pipeline and allows biologists to track their data.

Results

We have chosen a data grid system, iRODS (Rule-Oriented Data management systems), to act as the data management system for the WTSI. iRODS provides a rule-based system management approach which makes data replication much easier and provides extra data protection. Unlike the metadata provided by traditional file systems, the metadata system of iRODS is comprehensive and allows users to customize their own application level metadata. Users and IT experts in the WTSI can then query the metadata to find and track data.
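
The idea of user-defined, queryable metadata described above can be illustrated with a minimal sketch. This is an in-memory stand-in, not the iRODS API: the `Catalogue` and `DataObject` classes, the file paths and the `study` attribute are all hypothetical names chosen for illustration. In an actual iRODS deployment the `imeta` command plays a comparable role.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataObject:
    """A stored file plus user-defined metadata (hypothetical model, not iRODS)."""
    path: str
    metadata: Dict[str, str] = field(default_factory=dict)


class Catalogue:
    """In-memory stand-in for a metadata catalogue (illustration only)."""

    def __init__(self) -> None:
        self._objects: Dict[str, DataObject] = {}

    def put(self, path: str) -> DataObject:
        """Register a data object under its logical path."""
        obj = DataObject(path)
        self._objects[path] = obj
        return obj

    def add_metadata(self, path: str, attribute: str, value: str) -> None:
        """Attach an application-level attribute/value pair to an object."""
        self._objects[path].metadata[attribute] = value

    def query(self, attribute: str, value: str) -> List[str]:
        """Return the sorted paths of objects whose attribute equals value."""
        return sorted(
            path
            for path, obj in self._objects.items()
            if obj.metadata.get(attribute) == value
        )


# Hypothetical example data: three sequencing files tagged by study.
catalogue = Catalogue()
for lane, study in [("lane1", "example_study"),
                    ("lane2", "other_study"),
                    ("lane3", "example_study")]:
    path = f"/seq/run_0001/{lane}.bam"
    catalogue.put(path)
    catalogue.add_metadata(path, "study", study)

matches = catalogue.query("study", "example_study")
print(matches)
```

The point of the sketch is that data are located by what the metadata say about them rather than by where the files sit in a directory tree.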

The aim of this paper is to describe how we designed and used (from both system and user viewpoints) iRODS as a data management system. Details are given about the problems faced and the solutions found when iRODS was implemented. A simple use case describing how users within the WTSI use iRODS is also introduced.

Conclusions

iRODS has been implemented and works as the production system for the sequencing pipeline of the WTSI. Both biologists and IT experts can now track and manage data, which could not previously be achieved. This novel approach allows biologists to define their own metadata and query the genomic data using those metadata.


DOI: 10.1186/1471-2105-12-361
PubMed: 21906284
PubMed Central: 3228552



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Implementing a genomic data management system using iRODS in the Wellcome Trust Sanger Institute</title>
<author>
<name sortKey="Chiang, Gen Tao" sort="Chiang, Gen Tao" uniqKey="Chiang G" first="Gen-Tao" last="Chiang">Gen-Tao Chiang</name>
<affiliation wicri:level="1">
<nlm:aff id="I1">Wellcome Trust Sanger Institute, Informatics System Group, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK</nlm:aff>
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Wellcome Trust Sanger Institute, Informatics System Group, Wellcome Trust Genome Campus, Hinxton, CB10 1SA</wicri:regionArea>
<wicri:noRegion>CB10 1SA</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Clapham, Peter" sort="Clapham, Peter" uniqKey="Clapham P" first="Peter" last="Clapham">Peter Clapham</name>
<affiliation wicri:level="1">
<nlm:aff id="I1">Wellcome Trust Sanger Institute, Informatics System Group, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK</nlm:aff>
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Wellcome Trust Sanger Institute, Informatics System Group, Wellcome Trust Genome Campus, Hinxton, CB10 1SA</wicri:regionArea>
<wicri:noRegion>CB10 1SA</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Qi, Guoying" sort="Qi, Guoying" uniqKey="Qi G" first="Guoying" last="Qi">Guoying Qi</name>
<affiliation wicri:level="1">
<nlm:aff id="I2">Wellcome Trust Sanger Institute, New Sequencing Technologies, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK</nlm:aff>
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Wellcome Trust Sanger Institute, New Sequencing Technologies, Wellcome Trust Genome Campus, Hinxton, CB10 1SA</wicri:regionArea>
<wicri:noRegion>CB10 1SA</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Sale, Kevin" sort="Sale, Kevin" uniqKey="Sale K" first="Kevin" last="Sale">Kevin Sale</name>
<affiliation wicri:level="1">
<nlm:aff id="I3">Wellcome Trust Sanger Institute, Infrastructure Management Team, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK</nlm:aff>
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Wellcome Trust Sanger Institute, Infrastructure Management Team, Wellcome Trust Genome Campus, Hinxton, CB10 1SA</wicri:regionArea>
<wicri:noRegion>CB10 1SA</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Coates, Guy" sort="Coates, Guy" uniqKey="Coates G" first="Guy" last="Coates">Guy Coates</name>
<affiliation wicri:level="1">
<nlm:aff id="I1">Wellcome Trust Sanger Institute, Informatics System Group, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK</nlm:aff>
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Wellcome Trust Sanger Institute, Informatics System Group, Wellcome Trust Genome Campus, Hinxton, CB10 1SA</wicri:regionArea>
<wicri:noRegion>CB10 1SA</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">21906284</idno>
<idno type="pmc">3228552</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3228552</idno>
<idno type="RBID">PMC:3228552</idno>
<idno type="doi">10.1186/1471-2105-12-361</idno>
<date when="2011">2011</date>
<idno type="wicri:Area/Pmc/Corpus">000263</idno>
<idno type="wicri:Area/Pmc/Curation">000263</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000507</idno>
<idno type="wicri:Area/Ncbi/Merge">000260</idno>
<idno type="wicri:Area/Ncbi/Curation">000260</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">000260</idno>
<idno type="wicri:Area/Main/Merge">000692</idno>
<idno type="wicri:Area/Main/Curation">000690</idno>
<idno type="wicri:Area/Main/Exploration">000690</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Implementing a genomic data management system using iRODS in the Wellcome Trust Sanger Institute</title>
<author>
<name sortKey="Chiang, Gen Tao" sort="Chiang, Gen Tao" uniqKey="Chiang G" first="Gen-Tao" last="Chiang">Gen-Tao Chiang</name>
<affiliation wicri:level="1">
<nlm:aff id="I1">Wellcome Trust Sanger Institute, Informatics System Group, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK</nlm:aff>
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Wellcome Trust Sanger Institute, Informatics System Group, Wellcome Trust Genome Campus, Hinxton, CB10 1SA</wicri:regionArea>
<wicri:noRegion>CB10 1SA</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Clapham, Peter" sort="Clapham, Peter" uniqKey="Clapham P" first="Peter" last="Clapham">Peter Clapham</name>
<affiliation wicri:level="1">
<nlm:aff id="I1">Wellcome Trust Sanger Institute, Informatics System Group, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK</nlm:aff>
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Wellcome Trust Sanger Institute, Informatics System Group, Wellcome Trust Genome Campus, Hinxton, CB10 1SA</wicri:regionArea>
<wicri:noRegion>CB10 1SA</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Qi, Guoying" sort="Qi, Guoying" uniqKey="Qi G" first="Guoying" last="Qi">Guoying Qi</name>
<affiliation wicri:level="1">
<nlm:aff id="I2">Wellcome Trust Sanger Institute, New Sequencing Technologies, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK</nlm:aff>
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Wellcome Trust Sanger Institute, New Sequencing Technologies, Wellcome Trust Genome Campus, Hinxton, CB10 1SA</wicri:regionArea>
<wicri:noRegion>CB10 1SA</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Sale, Kevin" sort="Sale, Kevin" uniqKey="Sale K" first="Kevin" last="Sale">Kevin Sale</name>
<affiliation wicri:level="1">
<nlm:aff id="I3">Wellcome Trust Sanger Institute, Infrastructure Management Team, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK</nlm:aff>
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Wellcome Trust Sanger Institute, Infrastructure Management Team, Wellcome Trust Genome Campus, Hinxton, CB10 1SA</wicri:regionArea>
<wicri:noRegion>CB10 1SA</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Coates, Guy" sort="Coates, Guy" uniqKey="Coates G" first="Guy" last="Coates">Guy Coates</name>
<affiliation wicri:level="1">
<nlm:aff id="I1">Wellcome Trust Sanger Institute, Informatics System Group, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK</nlm:aff>
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Wellcome Trust Sanger Institute, Informatics System Group, Wellcome Trust Genome Campus, Hinxton, CB10 1SA</wicri:regionArea>
<wicri:noRegion>CB10 1SA</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint>
<date when="2011">2011</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>Increasingly large amounts of DNA sequencing data are being generated within the Wellcome Trust Sanger Institute (WTSI). The traditional file system struggles to handle these increasing amounts of sequence data. A good data management system therefore needs to be implemented and integrated into the current WTSI infrastructure. Such a system enables good management of the IT infrastructure of the sequencing pipeline and allows biologists to track their data.</p>
</sec>
<sec>
<title>Results</title>
<p>We have chosen a data grid system, iRODS (Rule-Oriented Data management systems), to act as the data management system for the WTSI. iRODS provides a rule-based system management approach which makes data replication much easier and provides extra data protection. Unlike the metadata provided by traditional file systems, the metadata system of iRODS is comprehensive and allows users to customize their own application level metadata. Users and IT experts in the WTSI can then query the metadata to find and track data.</p>
<p>The aim of this paper is to describe how we designed and used (from both system and user viewpoints) iRODS as a data management system. Details are given about the problems faced and the solutions found when iRODS was implemented. A simple use case describing how users within the WTSI use iRODS is also introduced.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>iRODS has been implemented and works as the production system for the sequencing pipeline of the WTSI. Both biologists and IT experts can now track and manage data, which could not previously be achieved. This novel approach allows biologists to define their own metadata and query the genomic data using those metadata.</p>
</sec>
</div>
</front>
<back>
<div1 type="bibliography">
<listBibl>
<biblStruct>
<analytic>
<author>
<name sortKey="Mardis, Er" uniqKey="Mardis E">ER Mardis</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Cuff, Jj" uniqKey="Cuff J">JJ Cuff</name>
</author>
<author>
<name sortKey="Coates, G" uniqKey="Coates G">G Coates</name>
</author>
<author>
<name sortKey="Cutts, T" uniqKey="Cutts T">T Cutts</name>
</author>
<author>
<name sortKey="Rae, M" uniqKey="Rae M">M Rae</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Schmuck, F" uniqKey="Schmuck F">F Schmuck</name>
</author>
<author>
<name sortKey="Roger, H" uniqKey="Roger H">H Roger</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Bell, G" uniqKey="Bell G">G Bell</name>
</author>
<author>
<name sortKey="Hey, T" uniqKey="Hey T">T Hey</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chervenak, A" uniqKey="Chervenak A">A Chervenak</name>
</author>
<author>
<name sortKey="Foster, I" uniqKey="Foster I">I Foster</name>
</author>
<author>
<name sortKey="Kesselman, C" uniqKey="Kesselman C">C Kesselman</name>
</author>
<author>
<name sortKey="Salisbury, C" uniqKey="Salisbury C">C Salisbury</name>
</author>
<author>
<name sortKey="Tuecke, S" uniqKey="Tuecke S">S Tuecke</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Baru, C" uniqKey="Baru C">C Baru</name>
</author>
<author>
<name sortKey="Moore, R" uniqKey="Moore R">R Moore</name>
</author>
<author>
<name sortKey="Rajasekar, A" uniqKey="Rajasekar A">A Rajasekar</name>
</author>
<author>
<name sortKey="Wan, M" uniqKey="Wan M">M Wan</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Hedges, M" uniqKey="Hedges M">M Hedges</name>
</author>
<author>
<name sortKey="Blanke, T" uniqKey="Blanke T">T Blanke</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Rajasekar, A" uniqKey="Rajasekar A">A Rajasekar</name>
</author>
<author>
<name sortKey="Moore, R" uniqKey="Moore R">R Moore</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Saljea, Ekh" uniqKey="Saljea E">EKH Saljea</name>
</author>
<author>
<name sortKey="Artachoa, E" uniqKey="Artachoa E">E Artachoa</name>
</author>
<author>
<name sortKey="Austen, Kf" uniqKey="Austen K">KF Austen</name>
</author>
<author>
<name sortKey="Bruin, Rp" uniqKey="Bruin R">RP Bruin</name>
</author>
<author>
<name sortKey="Calleja, M" uniqKey="Calleja M">M Calleja</name>
</author>
<author>
<name sortKey="Chappell, H" uniqKey="Chappell H">H Chappell</name>
</author>
<author>
<name sortKey="Chiang, G T" uniqKey="Chiang G">G-T Chiang</name>
</author>
<author>
<name sortKey="Dove, Mt" uniqKey="Dove M">MT Dove</name>
</author>
<author>
<name sortKey="Frame, I" uniqKey="Frame I">I Frame</name>
</author>
<author>
<name sortKey="Goodwin, A" uniqKey="Goodwin A">A Goodwin</name>
</author>
<author>
<name sortKey="Kleese Van Damc, K" uniqKey="Kleese Van Damc K">K Kleese van Damc</name>
</author>
<author>
<name sortKey="Marmierd, A" uniqKey="Marmierd A">A Marmierd</name>
</author>
<author>
<name sortKey="Parker, Sc" uniqKey="Parker S">SC Parker</name>
</author>
<author>
<name sortKey="Pruneda, M" uniqKey="Pruneda M">M Pruneda</name>
</author>
<author>
<name sortKey="Todorovac, It" uniqKey="Todorovac I">IT Todorovac</name>
</author>
<author>
<name sortKey="Trachenko, K" uniqKey="Trachenko K">K Trachenko</name>
</author>
<author>
<name sortKey="Tyer, R" uniqKey="Tyer R">R Tyer</name>
</author>
<author>
<name sortKey="White, Toh" uniqKey="White T">TOH White</name>
</author>
<author>
<name sortKey="Walker, Am" uniqKey="Walker A">AM Walker</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Li, H" uniqKey="Li H">H Li</name>
</author>
<author>
<name sortKey="Handsaker, B" uniqKey="Handsaker B">B Handsaker</name>
</author>
<author>
<name sortKey="Wysoker, A" uniqKey="Wysoker A">A Wysoker</name>
</author>
<author>
<name sortKey="Fennell, T" uniqKey="Fennell T">T Fennell</name>
</author>
<author>
<name sortKey="Ruan, J" uniqKey="Ruan J">J Ruan</name>
</author>
<author>
<name sortKey="Homer, N" uniqKey="Homer N">N Homer</name>
</author>
<author>
<name sortKey="Marth, G" uniqKey="Marth G">G Marth</name>
</author>
<author>
<name sortKey="Abecasis, G" uniqKey="Abecasis G">G Abecasis</name>
</author>
<author>
<name sortKey="Durbin, R" uniqKey="Durbin R">R Durbin</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Jordan, C" uniqKey="Jordan C">C Jordan</name>
</author>
<author>
<name sortKey="Stanzione, D" uniqKey="Stanzione D">D Stanzione</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Basney, J" uniqKey="Basney J">J Basney</name>
</author>
<author>
<name sortKey="Humphrey, M" uniqKey="Humphrey M">M Humphrey</name>
</author>
<author>
<name sortKey="Welch, V" uniqKey="Welch V">V Welch</name>
</author>
</analytic>
</biblStruct>
<biblStruct></biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chiang, G T" uniqKey="Chiang G">G-T Chiang</name>
</author>
<author>
<name sortKey="Dove, Mt" uniqKey="Dove M">MT Dove</name>
</author>
<author>
<name sortKey="Bovolo, I" uniqKey="Bovolo I">I Bovolo</name>
</author>
<author>
<name sortKey="Ewen, J" uniqKey="Ewen J">J Ewen</name>
</author>
</analytic>
</biblStruct>
<biblStruct>
<analytic>
<author>
<name sortKey="Chiang, G T" uniqKey="Chiang G">G-T Chiang</name>
</author>
<author>
<name sortKey="White, Toh" uniqKey="White T">TOH White</name>
</author>
<author>
<name sortKey="Bovolo, I" uniqKey="Bovolo I">I Bovolo</name>
</author>
<author>
<name sortKey="Ewen, J" uniqKey="Ewen J">J Ewen</name>
</author>
</analytic>
</biblStruct>
</listBibl>
</div1>
</back>
</TEI>
<affiliations>
<list>
<country>
<li>Royaume-Uni</li>
</country>
</list>
<tree>
<country name="Royaume-Uni">
<noRegion>
<name sortKey="Chiang, Gen Tao" sort="Chiang, Gen Tao" uniqKey="Chiang G" first="Gen-Tao" last="Chiang">Gen-Tao Chiang</name>
</noRegion>
<name sortKey="Clapham, Peter" sort="Clapham, Peter" uniqKey="Clapham P" first="Peter" last="Clapham">Peter Clapham</name>
<name sortKey="Coates, Guy" sort="Coates, Guy" uniqKey="Coates G" first="Guy" last="Coates">Guy Coates</name>
<name sortKey="Qi, Guoying" sort="Qi, Guoying" uniqKey="Qi G" first="Guoying" last="Qi">Guoying Qi</name>
<name sortKey="Sale, Kevin" sort="Sale, Kevin" uniqKey="Sale K" first="Kevin" last="Sale">Kevin Sale</name>
</country>
</tree>
</affiliations>
</record>
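
As a hedged illustration of how the identifiers in this record might be read programmatically, the sketch below parses a copy of part of the `<publicationStmt>` element with Python's standard `xml.etree.ElementTree`. The `extract_identifiers` helper is a hypothetical name, not part of Dilib. The sketch works on this fragment only, because the record as displayed does not declare the `wicri:` and `nlm:` prefixes used on other elements and attributes, so a namespace-aware parser may reject the full record.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Fragment copied from the record above; it uses no prefixed element or
# attribute names, so it parses without namespace declarations.
FRAGMENT = """\
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">21906284</idno>
<idno type="pmc">3228552</idno>
<idno type="RBID">PMC:3228552</idno>
<idno type="doi">10.1186/1471-2105-12-361</idno>
<date when="2011">2011</date>
</publicationStmt>
"""


def extract_identifiers(xml_text: str) -> Dict[str, str]:
    """Map each <idno> type attribute to its text content (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return {
        idno.get("type", ""): (idno.text or "").strip()
        for idno in root.iter("idno")
    }


identifiers = extract_identifiers(FRAGMENT)
print(identifiers["doi"])
```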

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000690 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000690 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     PMC:3228552
   |texte=   Implementing a genomic data management system using iRODS in the Wellcome Trust Sanger Institute
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i   -Sk "pubmed:21906284" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 

Wicri

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024